Artificial intelligence visual art, or AI art, is visual artwork generated or enhanced through the implementation of artificial intelligence (AI) programs, most commonly using text-to-image models. The process of automated art-making has existed since antiquity. The field of artificial intelligence was founded in the 1950s, and artists began to create art with artificial intelligence shortly after the discipline's founding. A select number of these creations have been showcased in museums and have been recognized with awards. Throughout its history, AI has raised many philosophical questions related to the mind and the nature of art in human–AI collaboration.
During the AI boom of the 2020s, text-to-image models such as Midjourney, DALL-E and Stable Diffusion became widely available to the public, allowing users to quickly generate imagery with little effort. Commentary about AI art in the 2020s has often focused on issues related to copyright, deception, defamation, and its impact on more traditional artists, including technological unemployment.
In the 19th century, Ada Lovelace wrote that "computing operations" could potentially be used to generate music and poems (Natale, S., & Henrickson, L. (2022), "The Lovelace Effect: Perceptions of Creativity in Machines", White Rose Research Online, https://eprints.whiterose.ac.uk/182906/6/NMS-20-1531.R2_Proof_hi%20%282%29.pdf; Lovelace, A. (1843), "Notes by the translator", Taylor's Scientific Memoirs, 3, 666–731). In 1950, Alan Turing's paper "Computing Machinery and Intelligence" focused on whether machines can mimic human behavior convincingly. Shortly afterwards, the academic discipline of artificial intelligence was founded at a research workshop at Dartmouth College in 1956.
Since its founding, AI researchers have explored philosophical questions about the nature of the human mind and the consequences of creating artificial beings with human-like intelligence; these issues have previously been explored by myth, fiction, and philosophy since antiquity.
One of the first significant AI art systems is AARON, developed by Harold Cohen beginning in the late 1960s at the University of California, San Diego. AARON uses a symbolic rule-based approach, characteristic of the GOFAI era of programming, to generate technical images; Cohen developed it with the goal of being able to code the act of drawing.
Karl Sims has exhibited art created with artificial life since the 1980s. He received an M.S. in computer graphics from the MIT Media Lab in 1987 and was artist-in-residence from 1990 to 1996 at the supercomputer manufacturer and artificial intelligence company Thinking Machines. In both 1991 and 1992, Sims won the Golden Nica award at Prix Ars Electronica for his videos using artificial evolution. In 1997, Sims created the interactive artificial evolution installation Galápagos for the NTT InterCommunication Center in Tokyo. Sims received an Emmy Award in 2019 for outstanding achievement in engineering development.
In 1999, Scott Draves and a team of several engineers created and released Electric Sheep as a free software screensaver. Electric Sheep is a volunteer computing project for animating and evolving fractal flames, which are distributed to networked computers that display them as a screensaver. The screensaver used AI to create an infinite animation by learning from its audience. In 2001, Draves won the Fundacion Telefónica Life 4.0 prize for Electric Sheep.
In 2014, Stephanie Dinkins began working on Conversations with Bina48. For the series, Dinkins recorded her conversations with BINA48, a social robot that resembles a middle-aged black woman. In 2019, Dinkins won the Creative Capital award for her creation of an evolving artificial intelligence based on the "interests and culture(s) of people of color."
In 2015, Sougwen Chung began Mimicry (Drawing Operations Unit: Generation 1), an ongoing collaboration between the artist and a robotic arm. In 2019, Chung won the Lumen Prize for her continued performances with a robotic arm that uses AI to attempt to draw in a manner similar to Chung's own. In 2018, an auction sale of artificial intelligence art was held at Christie's in New York, where the AI artwork Edmond de Belamy, created with a generative adversarial network, sold for US$432,500, which was almost 45 times higher than its estimate of US$7,000–10,000. The artwork was created by Obvious, a Paris-based collective.
In 2024, the Japanese film generAIdoscope was released. The film was co-directed by Otsuichi, Takeshi Sone, and Hiroki Yamaguchi. All video, audio, and music in the film were created with artificial intelligence.
In 2025, the Japanese anime television series Twins Hinahima was released. The anime was produced with AI assistance for cutting and for converting photographs into anime illustrations, which were later retouched by art staff. Most of the remaining elements, such as characters and logos, were hand-drawn with various software.
In 2014, Ian Goodfellow and colleagues at Université de Montréal developed the generative adversarial network (GAN), a type of deep neural network capable of learning to mimic the statistical distribution of input data such as images. The GAN uses a "generator" to create new images and a "discriminator" to decide which created images are considered successful. Unlike previous algorithmic art that followed hand-coded rules, generative adversarial networks could learn a specific aesthetic by analyzing a dataset of example images.
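The adversarial setup can be illustrated with a minimal sketch, assuming PyTorch and a toy dataset of two-dimensional points standing in for images; it is an illustrative toy example rather than any specific published model.

```python
# Minimal GAN sketch in PyTorch (illustrative toy example, not a published model):
# the generator maps random noise to samples, the discriminator scores real versus
# generated samples, and the two networks are optimized adversarially. Points drawn
# from a 2-D Gaussian stand in for a dataset of real images.
import torch
import torch.nn as nn

torch.manual_seed(0)

generator = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))
discriminator = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(1000):
    real = torch.randn(64, 2) * 0.5 + torch.tensor([2.0, -1.0])   # "real" data
    fake = generator(torch.randn(64, 8))                          # generated data

    # Discriminator update: label real samples 1 and generated samples 0.
    d_loss = bce(discriminator(real), torch.ones(64, 1)) + \
             bce(discriminator(fake.detach()), torch.zeros(64, 1))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Generator update: try to make the discriminator label its samples as real.
    g_loss = bce(discriminator(generator(torch.randn(64, 8))), torch.ones(64, 1))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
```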
In 2015, a team at Google released DeepDream, a program that uses a convolutional neural network to find and enhance patterns in images via algorithmic pareidolia. The process creates deliberately over-processed images with a dream-like appearance reminiscent of a psychedelic experience. Later, in 2017, a conditional GAN learned to generate 1000 image classes of ImageNet, a large visual database designed for use in visual object recognition software research. By conditioning the GAN on both random noise and a specific class label, this approach enhanced the quality of image synthesis for class-conditional models.
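The pattern-enhancement idea behind DeepDream can be sketched as gradient ascent on the activations of a pretrained convolutional network. The sketch below is a simplified illustration that assumes torchvision's pretrained VGG16 and a placeholder input file name; the original DeepDream used an Inception network together with multi-scale refinements that are omitted here.

```python
# Simplified DeepDream-style loop (illustrative): the input image is repeatedly
# adjusted by gradient ascent to maximize the mean activation of an intermediate
# convolutional layer, amplifying whatever patterns that layer responds to.
# "input.jpg" is a placeholder path; VGG16 stands in for the Inception network, and
# the multi-scale "octave" processing used by the original DeepDream is omitted.
import torch
from torchvision import models, transforms
from PIL import Image

layers = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features[:16].eval()
for p in layers.parameters():
    p.requires_grad_(False)

x = transforms.functional.to_tensor(Image.open("input.jpg").convert("RGB")).unsqueeze(0)
x.requires_grad_(True)

for _ in range(30):
    loss = layers(x).mean()                 # objective: boost layer activations
    loss.backward()
    with torch.no_grad():
        x += 0.01 * x.grad / (x.grad.abs().mean() + 1e-8)   # normalized ascent step
        x.clamp_(0, 1)
    x.grad.zero_()

transforms.functional.to_pil_image(x.detach().squeeze(0)).save("dream.jpg")
```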
Autoregressive models have also been used for image generation, such as PixelRNN (2016), which autoregressively generates one pixel after another with a recurrent neural network. Shortly after the Transformer architecture was proposed in "Attention Is All You Need" (2017), it was used for autoregressive generation of images, but without text conditioning.
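The autoregressive idea, generating each pixel conditioned on all previously generated pixels, can be shown with a toy, untrained recurrent model; the sketch below is structural only and omits the two-dimensional conditioning, masking, and training that PixelRNN involves.

```python
# Toy sketch of autoregressive pixel generation (illustrative, untrained): a GRU
# predicts a distribution over the next pixel intensity given all previous pixels,
# and an image is sampled one pixel at a time. PixelRNN adds 2-D structure and
# masking on top of this basic idea; training is omitted here.
import torch
import torch.nn as nn

vocab = 256                               # 8-bit grayscale intensities
embed = nn.Embedding(vocab + 1, 32)       # extra index 256 acts as a start token
rnn = nn.GRU(32, 64, batch_first=True)
head = nn.Linear(64, vocab)

h, w = 8, 8
pixels = [vocab]                          # begin with the start token
hidden = None
with torch.no_grad():
    for _ in range(h * w):
        inp = embed(torch.tensor([[pixels[-1]]]))     # shape (1, 1, 32)
        out, hidden = rnn(inp, hidden)
        probs = torch.softmax(head(out[0, -1]), dim=-1)
        pixels.append(torch.multinomial(probs, 1).item())

image = torch.tensor(pixels[1:]).reshape(h, w)        # an 8x8 grayscale sample
print(image)
```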
The website Artbreeder, launched in 2018, uses the models StyleGAN and BigGAN to allow users to generate and modify images such as faces, landscapes, and paintings.
In the 2020s, text-to-image models, which generate images based on prompts, became widely used, marking yet another shift in the creation of AI-generated artworks.
In 2021, building on the influential generative pre-trained transformer (GPT) architecture used in the large language models GPT-2 and GPT-3, OpenAI released a series of images created with its text-to-image model DALL-E, an autoregressive generative model with essentially the same architecture as GPT-3. Later in 2021, EleutherAI released the open source VQGAN-CLIP, based on OpenAI's CLIP model. Diffusion models, generative models used to create synthetic data based on existing data, were first proposed in 2015, but they only became better than GANs at image synthesis in early 2021. The latent diffusion model was published in December 2021 and became the basis for the later Stable Diffusion (August 2022), developed through a collaboration between Stability AI, the CompVis group at Ludwig Maximilian University of Munich, and Runway.
In 2022, Midjourney was released, followed by Google Brain's Imagen and Parti, which were announced in May 2022, Microsoft's NUWA-Infinity, and the source-available Stable Diffusion, which was released in August 2022. DALL-E 2, a successor to DALL-E, was beta-tested and released, with the further successor DALL-E 3 following in 2023. Stability AI offers DreamStudio, a web interface for Stable Diffusion; there are also Stable Diffusion plugins for Krita, Photoshop, Blender, and GIMP, as well as the Automatic1111 web-based open source user interface. Stable Diffusion's main pre-trained model is shared on the Hugging Face Hub.
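As an illustration of how such a publicly shared checkpoint can be run locally, the sketch below uses the Hugging Face diffusers library; the checkpoint identifier, prompt, and parameter values are illustrative assumptions rather than details from the sources above, and a CUDA GPU is assumed.

```python
# Minimal text-to-image sketch using the Hugging Face diffusers library (assumes
# diffusers, transformers and torch are installed and a CUDA GPU is available).
# The checkpoint identifier and prompt are illustrative; any Stable Diffusion
# checkpoint shared on the Hub could be substituted.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",   # one publicly shared checkpoint
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    prompt="an impressionist painting of a lighthouse at dawn",
    num_inference_steps=30,            # number of denoising steps
    guidance_scale=7.5,                # classifier-free guidance strength
).images[0]

image.save("lighthouse.png")
```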
Ideogram was released in August 2023; the model is known for its ability to generate legible text.
In 2024, Flux was released. The model can generate realistic images and was integrated into Grok, the chatbot used on X (formerly Twitter), and Le Chat, the chatbot of Mistral AI. Flux was developed by Black Forest Labs, a company founded by researchers behind Stable Diffusion. Grok later switched to its own text-to-image model, Aurora, in December of the same year. Several companies have also integrated AI image-generation models into their image-editing products: Adobe has released Adobe Firefly and integrated it into Premiere Pro, Photoshop, and Illustrator, and Microsoft has publicly announced AI image-generator features for Microsoft Paint. Examples of text-to-video models of the mid-2020s include Runway's Gen-4, Google's VideoPoet, OpenAI's Sora, released in December 2024, and LTX-2, released in 2025.
In 2025, several models were released. GPT Image 1 from OpenAI, launched in March 2025, introduced new text rendering and multimodal capabilities, enabling image generation from diverse inputs such as sketches and text. A new version of Midjourney debuted in April 2025, providing improved text prompt processing. In May 2025, Flux.1 Kontext by Black Forest Labs emerged as an efficient model for high-fidelity image generation, while Google's Imagen 4 was released with improved photorealism. Flux.2 debuted in November 2025 with improved image referencing, typography, and prompt understanding.
In addition, procedural "rule-based" image generation techniques have been developed, utilizing mathematical patterns, algorithms that simulate brush strokes and other painterly effects, as well as deep learning models such as generative adversarial networks (GANs) and transformers. Several companies have released applications and websites that allow users to focus exclusively on positive prompts, bypassing the need for manual configuration of other parameters. There are also programs capable of transforming photographs into stylized images that mimic the aesthetics of well-known painting styles.
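A minimal example of the procedural, rule-based approach is sketched below: pixel colors are computed directly from mathematical patterns, with no trained model involved. The specific pattern functions are arbitrary illustrative choices.

```python
# Minimal procedural ("rule-based") image generation: pixel colors are computed
# directly from mathematical patterns, with no trained model involved. The specific
# trigonometric pattern is an arbitrary illustrative choice.
import numpy as np
from PIL import Image

h, w = 512, 512
y, x = np.mgrid[0:h, 0:w] / h          # normalized coordinates

r = (np.sin(12 * np.pi * x) * np.cos(8 * np.pi * y) + 1) / 2
g = (np.sin(10 * np.pi * (x + y)) + 1) / 2
b = (np.cos(14 * np.pi * np.hypot(x - 0.5, y - 0.5)) + 1) / 2

rgb = (np.stack([r, g, b], axis=-1) * 255).astype(np.uint8)
Image.fromarray(rgb).save("pattern.png")
```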
There are many tools for generating AI imagery, ranging from simple consumer-facing mobile apps to Project Jupyter notebooks and web UIs that require powerful GPUs to run effectively. Additional functionality includes "textual inversion," which enables the use of user-provided concepts (such as an object or a style) learned from a few images; novel art can then be generated from the word or words assigned to the learned, often abstract, concept. Models can also be extended or fine-tuned, for example with DreamBooth.
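As a hedged sketch of how such a learned concept can be used, the example below loads a publicly shared textual-inversion embedding into a Stable Diffusion pipeline with the diffusers library and references it by its placeholder token; the checkpoint, concept repository, and token shown are illustrative.

```python
# Hedged sketch: loading a publicly shared textual-inversion concept into a Stable
# Diffusion pipeline with diffusers and referring to it by its placeholder token.
# The checkpoint, the concept repository ("sd-concepts-library/cat-toy"), and its
# "<cat-toy>" token are illustrative examples; a user-trained embedding could be
# loaded the same way.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

pipe.load_textual_inversion("sd-concepts-library/cat-toy")

image = pipe("a watercolor painting of a <cat-toy> on a beach").images[0]
image.save("concept.png")
```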
Two computational methods, close reading and distant viewing, are the typical approaches used to analyze digitized art. Close reading focuses on specific visual aspects of one piece. Some tasks performed by machines in close reading methods include computational artist authentication and analysis of brushstrokes or texture properties. In contrast, through distant viewing methods, the similarity across an entire collection for a specific feature can be statistically visualized. Common tasks relating to this method include automatic classification, object detection, multimodal tasks, knowledge discovery in art history, and computational aesthetics. Synthetic images can also be used to train AI algorithms for art authentication and to detect forgeries.
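A distant-viewing style analysis can be sketched by embedding every image in a collection with a pretrained vision model and comparing the embeddings across the whole collection; the CLIP checkpoint, folder path, and cosine-similarity measure below are illustrative choices, not a method prescribed by any particular study.

```python
# Illustrative "distant viewing" sketch: every image in a folder of digitized
# artworks is embedded with a pretrained CLIP image encoder, and pairwise cosine
# similarity is computed across the whole collection, which could then be clustered
# or visualized. The folder path "collection/" is a placeholder.
from pathlib import Path
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

paths = sorted(Path("collection/").glob("*.jpg"))
images = [Image.open(p).convert("RGB") for p in paths]

inputs = processor(images=images, return_tensors="pt")
with torch.no_grad():
    features = model.get_image_features(**inputs)

features = features / features.norm(dim=-1, keepdim=True)   # unit-normalize
similarity = features @ features.T                           # pairwise cosine similarity

for i, p in enumerate(paths):
    j = similarity[i].argsort(descending=True)[1].item()     # nearest other artwork
    print(p.name, "is most similar to", paths[j].name)
```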
Researchers have also introduced models that predict emotional responses to art. One such model is ArtEmis, a large-scale dataset paired with machine learning models. ArtEmis includes emotional annotations from over 6,500 participants along with textual explanations. By analyzing both visual inputs and the accompanying text descriptions from this dataset, ArtEmis enables the generation of nuanced emotional predictions.
According to a research study indexed by the National Library of Medicine, humans inherently show a bias against artwork described as being AI-generated. When participants of the study were shown two comparable images, with only one presented as having been generated by AI, subjects were more likely to rate the one described as artificially generated lower in artistic value. This suggests that social and cultural attitudes can shape whether an image is considered art, regardless of the image's other visual features.
In a 2023 report submitted to the Annual Convention of Digital Art Observers, Samuel Loomis wrote that the term "AI art" acknowledges its dual nature as a product of both human guidance and machine-driven generative systems when it is evaluated by the same critical standards applied to traditional art (Jonathan Doe, "A Summary and Analysis of Contemporary Digital Media Trends", Die Zeitung, February 2024).